Corruption-tolerant bandit learning

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bandit Problems and Online Learning

In this section, we consider problems related to the topic of online learning. In particular, we are interested in problems where data is made available sequentially, and decisions must be made or actions taken based on the data currently available. This is to be contrasted with many problems in optimization and model fitting, where the data under consideration is available at the start. Furthe...

متن کامل

Q-Learning for Bandit Problems

Multi-armed bandits may be viewed as decompositionally-structured Markov decision processes (MDP's) with potentially very large state sets. A particularly elegant methodology for computing optimal policies was developed over twenty ago by Gittins Gittins & Jones, 1974]. Gittins' approach reduces the problem of nding optimal policies for the original MDP to a sequence of low-dimensional stopping...

متن کامل

Bandit Learning with Positive Externalities

Many platforms are characterized by the fact that future user arrivals are likely to have preferences similar to users who were satisfied in the past. In other words, arrivals exhibit {\em positive externalities}. We study multiarmed bandit (MAB) problems with positive externalities. Our model has a finite number of arms and users are distinguished by the arm(s) they prefer. We model positive e...

متن کامل

Local Bandit Approximation for Optimal Learning Problems

In general, procedures for determining Bayes-optimal adaptive controls for Markov decision processes (MDP's) require a prohibitive amount of computation-the optimal learning problem is intractable. This paper proposes an approximate approach in which bandit processes are used to model, in a certain "local" sense, a given MDP. Bandit processes constitute an important subclass of MDP's, and have ...

متن کامل

Satisficing in Time-Sensitive Bandit Learning

Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action. One shortcoming is that this orientation does not account for time sensitivity, which can play a crucial role when learning an optimal action requires much more information than near-optimal ones. Indeed, popular approaches such as upper-confidence-bound methods and Thompson samplin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Machine Learning

سال: 2018

ISSN: 0885-6125,1573-0565

DOI: 10.1007/s10994-018-5758-5